Introduction to Video Streaming
Learn the process of video streaming and the relevant terminologies.
Introduction#
Before we further explore details on streaming services, let's first understand the different terminologies that are a part of the video streaming ecosystem. This lesson discusses the evolution, underlying concepts, specifications, frameworks, and protocols required to set a foundation for designing the API of streaming services like YouTube.
This lesson mainly consists of three sections that discuss the relevant terminologies in media streaming, the evolution of streaming, and the working mechanism of streaming. Let's start with the relevant terminologies in the following section.
Terminologies involved in media streaming#
Let’s understand the different terms in the media streaming system.
Encoding: This represents data in bits so that any redundant information is removed. An encoder will take the raw media as input at the sender’s end to compress it, whereas a decoder takes the encoded file as input at the receiver’s end to decompress it. Encoding makes it possible to send large media files over the Internet. However, the drawback is that it can result in loss of media file quality. Such encoding techniques are called lossy, although lossless algorithms also exist where the input to encoding and the output of decoding are identical to the last bit.
There are different audio and video encoding algorithms. For example, H.264 (also called AVC), H.265 (also HEVC), VP9, and AV1 are popular video encoding algorithms, whereas MP3, AAC, Dolby AC-3, etc., are well-known audio encoding algorithms.
There are a ton of audio and video formats that were created over time. Also, there are different formats used for running different applications. Here, we list a few reasons why so many formats were created:
- Some formats have technical limitations that make them unsuitable for a specific use. For example, on resource-constrained devices, it’s impractical to store the AVI video format because it takes up a lot of storage space.
- Some formats may suit our needs but cannot be employed because of legal issues. For instance, the format’s creator may not allow others to use it freely.
- Using specific formats can benefit our application, but they may have support issues. For instance, some newer formats may not be supported by common browsers.
Other issues can include high resource consumption (like memory, processing power, etc.), the complexity of the program, flexibility, and so on. Due to these issues, different creators (individuals or vendors) developed different formats. Depending on our needs, we should choose the format that suits our requirements best.
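The lossy/lossless distinction above can be made concrete with a small Python sketch using the standard library’s `zlib` module (a general-purpose lossless compressor, used here only as an illustration): a lossless round trip recovers the input bit for bit, which is exactly what lossy codecs such as H.264 or MP3 give up in exchange for far higher compression ratios.

```python
import zlib

raw = b"frame-data " * 1000          # stand-in for raw media bytes
compressed = zlib.compress(raw)      # lossless encoding
restored = zlib.decompress(compressed)

# Lossless: the decoder output is identical to the encoder input,
# down to the last bit.
assert restored == raw
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Repetitive data like this compresses very well losslessly; real audio and video are far less redundant, which is why media codecs accept some quality loss to shrink files enough for the Internet.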
Transcoding: Compressing a media file once is not enough to make it ready for playback on all end devices. The client-side available bandwidth can vary considerably throughout the day, and it might not be possible to stream the highest-quality video at all times. Falling back to a lower resolution is a graceful degradation of the user's quality of experience. For such cases, transcoding is performed: the uploaded raw video is converted into various formats, as shown below, to make the content suitable for playback on various end devices. A disadvantage of transcoding is that the same raw media file has to be stored in multiple formats on the server side. On the bright side, it enables adaptive streaming, which copes with dynamic network conditions and various screen resolutions.
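As a sketch of what transcoding produces, the snippet below builds a hypothetical "rendition ladder" from a single upload. The rendition labels, resolutions, and bitrates are illustrative, not tied to any particular service, and the actual encoding step (done by a tool such as an encoder library) is deliberately omitted.

```python
# Hypothetical source file and rendition ladder; all values are illustrative.
SOURCE = {"name": "raw_upload", "width": 3840, "height": 2160}

LADDER = [
    {"label": "1080p", "width": 1920, "height": 1080, "bitrate_kbps": 4500},
    {"label": "720p",  "width": 1280, "height": 720,  "bitrate_kbps": 2500},
    {"label": "480p",  "width": 854,  "height": 480,  "bitrate_kbps": 1000},
]

def transcode_plan(source, ladder):
    """Return one output descriptor per rendition. Every output is derived
    from the same raw upload, which is why transcoded content must be
    stored several times on the server side."""
    return [
        {"input": source["name"],
         "output": f"{source['name']}_{r['label']}.mp4",
         **r}
        for r in ladder
    ]

for job in transcode_plan(SOURCE, LADDER):
    print(job["output"], job["bitrate_kbps"], "kbps")
```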
Segmentation: This involves breaking a transcoded video down into small chunks of varying time durations. Each of these chunks is generally referred to as a segment. Typically, a segment is between two and ten seconds long. With segmentation, there is no need to download the entire media file at once. Segmentation requires the client to download a manifest file (more on this later) to obtain the list of available segments of a media file. In the illustration below, each segment is represented as a box of a different format, shown in different colors (green, yellow, and pink).
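A minimal sketch of how a media timeline is cut into segments (a fixed four-second segment length is assumed here; real packagers for HLS or DASH also have to respect keyframe boundaries):

```python
def segment_boundaries(duration_s, segment_s=4):
    """Split a media timeline into fixed-length segments; the last
    segment may be shorter. Returns (start, end) pairs in seconds."""
    bounds = []
    start = 0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds

print(segment_boundaries(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Each `(start, end)` pair corresponds to one independently downloadable segment, which is what makes per-segment quality switching possible.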
Adaptive bitrate (ABR): This is the ability to change the streaming media quality (that is, the bitrate) depending on the network dynamics and screen resolution. ABR makes it possible to change the quality of the content during the playback. Media segments of differing quality can be created at the content provider's end and sent to the receiving party, depending on the network bandwidth.
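A toy version of the client-side ABR decision might look like the following. The bitrate ladder and the 0.8 safety margin are assumptions for illustration; production players use far more sophisticated throughput and buffer-based heuristics.

```python
# Illustrative bitrate ladder in kbps; a real service publishes its
# available renditions in the manifest.
AVAILABLE_BITRATES_KBPS = [300, 1000, 2500, 4500]

def pick_bitrate(bandwidth_kbps, ladder=AVAILABLE_BITRATES_KBPS, safety=0.8):
    """Pick the highest rendition that fits within a safety margin of
    the measured bandwidth; fall back to the lowest rendition otherwise."""
    budget = bandwidth_kbps * safety
    candidates = [b for b in ladder if b <= budget]
    return max(candidates) if candidates else min(ladder)

print(pick_bitrate(6000))  # 4500: plenty of bandwidth
print(pick_bitrate(3000))  # 1000: 2500 exceeds the 2400 kbps budget
print(pick_bitrate(200))   # 300: below every rendition, take the lowest
```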
Quiz
(True or False) It is possible to perform ABR without performing transcoding first.
True
False
It is not possible to perform ABR without transcoding first, because transcoding is what generates the multiple bitrate versions that ABR switches between.
Buffering: This refers to downloading a certain amount of data before playback begins. A buffering delay occurs when the play rate exceeds the download rate of the media file. Using ABR reduces the frustration of buffering delays. In media streaming, we make our best effort to keep the buffering delay to a bare minimum.
Point to Ponder
Question
How much data should be buffered to avoid streaming lag or playback stall?
The answer to this question depends on several factors. Ideally, the allocated buffer should be large enough to support smooth playback, and the playback device needs enough RAM to hold a large buffer. The amount of buffered media should be balanced so that it neither overflows the buffer nor falls too low. If the buffer overflows, the excess data is dropped and has to be retransmitted, whereas if too little content is buffered, there is a chance of a playback stall.
Therefore, we must define some thresholds for the available buffers to avoid overflow or lag. To achieve this behavior, an algorithm will be devised in the player on the client side that will request media segments depending on the latency of segment arrival and availability of buffer space. For example, in the figure below, the lower boundary of the threshold would initiate a request from the client to the server for segments, whereas the higher threshold boundary would halt further requests because we have enough content for playback.
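The threshold behavior described above can be sketched as a simple watermark check. The specific watermark values are assumptions for illustration; real players tune them per device and network.

```python
# Hypothetical watermark thresholds, in seconds of buffered media.
LOW_WATERMARK_S = 10   # at or below this, start requesting segments
HIGH_WATERMARK_S = 30  # at or above this, stop requesting segments

def next_action(buffered_s, currently_fetching):
    """Decide whether the player should fetch more segments or pause,
    with hysteresis between the two watermarks to avoid flapping."""
    if buffered_s <= LOW_WATERMARK_S:
        return "fetch"   # risk of a stall: request more content
    if buffered_s >= HIGH_WATERMARK_S:
        return "pause"   # enough content buffered: halt requests
    # Between the watermarks, keep doing whatever we were doing.
    return "fetch" if currently_fetching else "pause"

print(next_action(5, False))   # fetch: below the low watermark
print(next_action(35, True))   # pause: above the high watermark
print(next_action(20, True))   # fetch: mid-range, keep fetching
```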
Evolution of video streaming#
The evolution of video streaming services over the Internet became a possibility after powerful machines, coupled with high bandwidth, came into existence. Initially, however, the Internet's bandwidth was used to establish inexpensive voice calls because existing telephone lines proved to be expensive. This idea gave rise to applications that could make free calls over the Internet. Slowly, with the dawn of high-speed Internet, the same applications started offering video calls as their core feature. Today, video streaming is one of the most popular web services.
Streaming video used to be a cumbersome process because browsers required additional plugins to play media. Popular streaming services, such as YouTube and Netflix, used plugins like Adobe Flash and Microsoft Silverlight to play videos over the years. However, as we'll see, things changed significantly with HTML5. Let's discuss the two epochs of media transfer via HTTP.
Progressive download: Here, the entire media file is downloaded at the client's end. In this method, the media can be considered a large web page, rather than a streaming file. Media playback is done before the entire file is downloaded because of fast start. However, buffering is still quite probable because ABR is unavailable. This is because the entire media file is downloaded as a single entity, instead of segments. Another problem with this media transfer method is that the end user can access the entire file sitting in the cache of the browser, which may not be desirable to certain streaming services because it can damage their business models. A visual depiction of the progressive download is provided below.
Adaptive bitrate streaming: In this method, the videos are played segment by segment, as shown in the following figure. Associated with each video is a manifest file, also known as the media presentation description (MPD), which has a list of all the segments with different resolutions, frame rates, frame numbers, the URL of each segment where it is stored, and so on.
Tip: It is important to understand that ABR streaming requires the client to place the request for the next chunk/segment based on its bitrate. The playback client chooses the optimal quality of the chunk using the manifest file, considering the available bandwidth.
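The tip above can be sketched with a toy manifest: a plain dictionary standing in for an MPD, listing the same segment at several bitrates, from which the client picks the URL that fits the measured bandwidth. The field names and URLs are made up for illustration; a real MPD is an XML document with a much richer schema.

```python
# Toy stand-in for a manifest/MPD: one entry per rendition, each with
# a URL template for its segments. All values are illustrative.
MANIFEST = {
    "renditions": [
        {"bitrate_kbps": 1000, "url_template": "https://cdn.example.com/v/1000k/seg_{n}.m4s"},
        {"bitrate_kbps": 2500, "url_template": "https://cdn.example.com/v/2500k/seg_{n}.m4s"},
        {"bitrate_kbps": 4500, "url_template": "https://cdn.example.com/v/4500k/seg_{n}.m4s"},
    ]
}

def next_segment_url(manifest, segment_index, bandwidth_kbps):
    """Choose the highest rendition not exceeding the measured bandwidth
    (or the lowest one if none fits) and return that segment's URL."""
    fitting = [r for r in manifest["renditions"]
               if r["bitrate_kbps"] <= bandwidth_kbps]
    chosen = (max(fitting, key=lambda r: r["bitrate_kbps"]) if fitting
              else min(manifest["renditions"], key=lambda r: r["bitrate_kbps"]))
    return chosen["url_template"].format(n=segment_index)

print(next_segment_url(MANIFEST, 7, 3000))
# https://cdn.example.com/v/2500k/seg_7.m4s
```

Because the decision is made per segment, the client can switch renditions at every segment boundary as bandwidth changes.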
Various streaming protocols support adaptive bitrate streaming, including HTTP live streaming (HLS), MPEG dynamic adaptive streaming over HTTP (MPEG-DASH), Web real-time communication (WebRTC), real-time messaging protocol (RTMP), and so on.
Let’s discuss some well-known streaming protocols:
HLS
This protocol is widely used for live streaming because it is robust and cost-effective: it doesn’t need any additional libraries or extensions on major browsers. It can deliver ultra-high-quality video and, with it, an excellent audience experience. It is also known for secure streaming. However, it is not recommended for low-bandwidth scenarios because it can increase latency.
MPEG-DASH
This is an open-source protocol that can be used with almost any streaming encoding format; hence, it is codec agnostic. However, this protocol has limited support, both on clients like browsers and electronics that process video streams.
WebRTC
This open-source protocol is highly recommended for video conferencing. It also supports streaming with real-time latency that allows video to travel to the client’s screen in virtually real time. However, this protocol is rarely adopted by the industry, which might bring compatibility issues.
RTMP
This protocol is widely used for live streaming because many existing devices (like IP cameras and encoders) use it to deliver streams. Due to its limited client-side support, it has to work in tandem with HLS to deliver content to end users. Thus, live streams are delivered to streaming servers through RTMP, whereas the final delivery to clients is done by HLS.
Note: RTMP is incompatible with the HTML5 video player and requires the Flash plugin.
How does streaming work?#
Media streaming is not a straightforward concept. Our goal in this section is to simplify it by breaking down the entire process into three data management layers.
Codec: This is the first layer, where we encode and decode the video and audio data. The codecs describe the actual bit-level detail of the media file. We have already mentioned some popular video and audio encoding schemes above.
Note: Did you know that H.265, which is a successor to H.264, is much more efficient than H.264? This means that files compressed using H.265 would be roughly 25–50 percent smaller than files compressed using its counterpart.
Point to Ponder
Question 2
What is a practical example of a codec being used to reduce the size of a media file?
Let’s understand the importance of codecs with the following real-world example:

Assume that we record raw 5.1 surround audio (six channels) for ten seconds at a sampling rate of 30,000 Hz with 16 bits (2 bytes) per sample. The size of the file to be transmitted to the client is:

6 channels × 30,000 samples/s × 2 bytes/sample × 10 s = 3,600,000 bytes ≈ 3.6 MB

Similarly, the size of the same ten seconds of raw video at a 1920×1080 resolution is:

1920 × 1080 pixels/frame × 3 bytes/pixel × 60 frames/s × 10 s = 3,732,480,000 bytes ≈ 3.73 GB

Here, the three bytes per pixel represent the RGB channels (one byte each for Red, Green, and Blue), and the recorded rate is 60 frames per second.

As we can see, roughly 3.74 GB of space is required to store just ten seconds of raw audio and video, which is very large. Therefore, codecs are used to reduce the raw data size.
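The raw-size arithmetic above can be checked with a few lines of Python (decimal units, i.e., 1 GB = 10⁹ bytes, are assumed):

```python
# Raw (uncompressed) media sizes for ten seconds of recording.
SECONDS = 10

# Audio: 5.1 surround = 6 channels, 30,000 samples/s, 2 bytes/sample.
audio_bytes = 6 * 30_000 * 2 * SECONDS

# Video: 1920x1080 pixels, 3 bytes/pixel (one each for R, G, B), 60 frames/s.
video_bytes = 1920 * 1080 * 3 * 60 * SECONDS

total_gb = (audio_bytes + video_bytes) / 1e9
print(f"audio: {audio_bytes / 1e6:.1f} MB")   # 3.6 MB
print(f"video: {video_bytes / 1e9:.2f} GB")   # 3.73 GB
print(f"total: {total_gb:.2f} GB")            # 3.74 GB
```

Note how thoroughly the video term dominates: this is why video codecs, far more than audio codecs, determine whether streaming over the Internet is feasible at all.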
Container format: When the individual audio and video streams are compressed, they are wrapped into a file known as the container. The container provides a technique to interleave different media data types, such as audio, video, and subtitles. Containers are generally known by their file format or extension. Typical examples of container formats are .MP4, .AVI, .FLV, and so on.
Transport protocol: This is the third data management layer in transferring data from the service provider to the end user's device. These protocols define standards by which the bits in containers get transferred to end devices.
In this lesson, we discussed some relevant terminologies and the process of video streaming. In the next lesson, we’ll discuss some important decisions that will allow us to develop an effective API design.